Abstract

An accurate determination of the impact parameter in a heavy ion collision
is crucial for almost all further analysis. The capabilities of an artificial
neural network are investigated in this respect. A novel input for the
network is proposed, namely the transverse and longitudinal momentum
distribution of all outgoing (or actually detectable) particles. The neural
network approach improves the performance by a factor of two compared to
classical techniques. To achieve this improvement, simple network
architectures and a 5 x 5 input grid in $(p_t, p_z)$ space are sufficient.

The physics of relativistic heavy ion collisions is motivated by the unique
opportunity to study the properties of hot and dense nuclear matter [1-9].

For a detailed investigation of heavy ion collisions a proper event characterization 
is mandatory. In particular, the impact parameter, though not directly accessible 
experimentally, is among the most important characteristics for the description of 
the event geometry and selection. 

For the investigation of highly compressed nuclear matter, it is important to select 
the most central collisions. On the other hand, recently discovered new phenomena 
such as pionic bounce-off and squeeze-out are only observed in semi-peripheral col- 
lisions. 

There have been various proposals for determining the impact parameter. Most
of these are based on the mean particle multiplicity, the ratio of transverse to
longitudinal energy deposition, a transverse momentum analysis (directivity cut), or
a combination of these techniques. However, all of these methods have
one thing in common: they tend to break down for very central collisions and are
generally optimized for a certain impact parameter range. The achievable accuracy
is estimated to be at best ±1 to ±1.5 fm.

Recently, neural networks have been suggested as tools for impact parameter
determination [10, 11]. While offering an improvement of approximately 50 % as
compared with "classical" techniques, their main shortcoming was the large amount
of preprocessing necessary for the selected input. Essentially, the proposed
networks were used as multidimensional nonlinear fits to combine the (already known)
impact parameter dependence of the various input observables (a combination of
three observables was used) into a single function yielding the impact parameter.

The use of preprocessed input has the advantage of reducing the amount of input
data (and therefore the computing time for the network output). Its great
disadvantage, however, is that only those correlations and pieces of information
preselected for the input can be accessed and optimized by the network. Previously
unknown correlations between input observables and the desired output may be
destroyed or left out through the preprocessing. Minimizing the amount of
preprocessing allows for taking advantage of the full capabilities of the neural
network. In addition, common preprocessing techniques are computationally expensive.

Those heavy ion collision events which are used as input for training and analyzing 
the network's performance have to be supplied by a theoretical model rather than by 
experiment (otherwise it would be impossible to compare the network output with a 
target value for the impact parameter). 

For the present study an extension of the Quantum Molecular Dynamics model
(QMD) [12-15] has been applied. It explicitly incorporates isospin and pion
production via the delta resonance (IQMD) [16-18]. In the QMD model the nucleons
are represented by Gaussian-shaped density distributions, giving the nuclei a surface
thickness of 1.5 fm. The initial momenta are randomly chosen between 0 and the
local Thomas-Fermi momentum. The $A_P$ projectile and $A_T$ target nucleons interact
via two- and three-body Skyrme forces, a Yukawa potential, momentum-dependent
interactions, a symmetry potential (to achieve a correct distribution of protons and
neutrons in the nucleus) and explicit Coulomb forces between the $Z_P$ and $Z_T$
protons. They are propagated according to Hamilton's equations of motion. Hard N-N
collisions are included by employing the collision term of the well-known VUU/BUU
model. The collisions are performed stochastically, in a similar way as in the
cascade models. In addition, Pauli blocking (for the final state) is taken
into account by regarding the phase space densities in the final states of a two-body
collision. Clusters (e.g. deuterons and tritons) are formed via a configuration space
coalescence model.

In order to minimize preprocessing and to allow the network to use "unknown
correlations", the longitudinal and transverse momenta per nucleon of all baryons in
the system were chosen as input. The left column of figure 1 shows the respective
final state momentum distributions for impact parameters of 1, 5, 9 and 11 fm. In
contrast to the full 4π information accessible via a QMD simulation, experiments
usually have a limited acceptance. The right column of figure 1 shows the same
momentum distributions as the left column, this time processed with the detector
filter of the FOPI experiment at GSI. The filter not only destroys the
symmetry about the $p_z = 0$ axis, but also drastically reduces the available phase
space information. The suggested neural algorithms should prove their usefulness
both for the "ideal" 4π dataset and for the more "realistic" filtered set. A simplistic
extrapolation of the network performance (from the "ideal" set to the "realistic"
one), as performed for the preprocessed input of IMF multiplicity, $p_x^{\mathrm{dir}}$ and ERAT
[11], is clearly not justified.

The event data have been transformed into a constant number of input components
to be used with the neural network. In this paper a new direct mapping of the
momentum-state distribution to the network's input is proposed. The momentum
distribution is discretized in a two-dimensional grid of equally sized momentum
bins (grids from 3 x 3 to 20 x 20 were used). For each bin, the corresponding
number of "hits" is counted and used as one input component.
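
The binning step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `momentum_grid`, the grid boundaries `pt_max` and `pz_max`, and the toy momenta are assumptions made for the example.

```python
def momentum_grid(particles, n, pt_max, pz_max):
    """Count particles in an n x n grid of equally sized (p_t, p_z) bins.

    particles: iterable of (pt, pz) pairs, momenta per nucleon.
    The grid spans 0 <= pt < pt_max and -pz_max <= pz < pz_max; these
    boundaries are illustrative assumptions, not values from the text.
    Returns a flat list of n*n hit counts (index = pt_bin * n + pz_bin).
    """
    counts = [0] * (n * n)
    for pt, pz in particles:
        if not (0.0 <= pt < pt_max and -pz_max <= pz < pz_max):
            continue  # particle lies outside the covered phase space
        i = int(pt / pt_max * n)                      # p_t bin index
        j = int((pz + pz_max) / (2.0 * pz_max) * n)   # p_z bin index
        counts[i * n + j] += 1
    return counts


# Toy event with four particles (made-up momenta); the last one falls
# outside the grid and is not counted.
event = [(0.1, -0.5), (0.1, -0.45), (0.45, 0.25), (2.0, 0.0)]
grid = momentum_grid(event, n=5, pt_max=1.0, pz_max=1.0)
```

The flat list `grid` has a fixed length of n x n regardless of the particle multiplicity of the event, which is what allows it to serve as a network input vector.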

We now sketch the neural network algorithm used, a standard feed-forward two-layer
perceptron trained by error backpropagation. The network consists of
a "hidden" layer of up to 20 nonlinear units receiving inputs from the applied data
vector and transferring their signals to the output unit (a single linear output unit
is used, whose continuous-valued output represents the impact parameter).

Every (hidden or output) unit performs a weighted sum over all input signals.
The hidden units calculate their own signals by applying a nonlinear "squashing"
function $\sigma(x)$ to the result:

$$ y_j^{\mathrm{hid}} = \sigma\Big( \sum_k w_{jk}^{\mathrm{hid}} x_k \Big) $$

As squashing function, $\sigma(x) = \tanh(x)$ is used; $x_k$ is a component of the data
vector. A connection weight (from input component $k$ to hidden unit $j$) is given by
$w_{jk}^{\mathrm{hid}}$. The output unit is linear:

$$ y^{\mathrm{out}} = \sum_j w_j^{\mathrm{out}} y_j^{\mathrm{hid}} $$

For each unit, a connection to a constant signal ($x_0 = 1$ for the hidden units,
$y_0^{\mathrm{hid}} = 1$ for the output unit) is included, which provides
an activity threshold.
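
As an illustration of the forward pass just described, the following Python sketch evaluates a network with tanh hidden units and a single linear output. The function name `forward` and the convention of storing the threshold weight at index 0 of each weight list are assumptions of this example.

```python
import math


def forward(x, w_hid, w_out):
    """Two-layer perceptron with tanh hidden units and one linear output.

    x:     list of input components x_k
    w_hid: w_hid[j] = [threshold, w_j1, w_j2, ...] for hidden unit j
    w_out: [threshold, w_1, w_2, ...] for the single output unit
    Index 0 of every weight list multiplies the constant signal 1.
    Returns (y_out, y_hid).
    """
    x_ext = [1.0] + list(x)
    y_hid = [math.tanh(sum(w * s for w, s in zip(w_j, x_ext))) for w_j in w_hid]
    y_ext = [1.0] + y_hid
    y_out = sum(w * s for w, s in zip(w_out, y_ext))
    return y_out, y_hid


# One hidden unit that only looks at the first input component.
y_out, y_hid = forward([0.5, 9.0], w_hid=[[0.0, 1.0, 0.0]], w_out=[1.0, 2.0])
```

Here `y_out` equals 1 + 2 tanh(0.5): the output threshold plus the weighted hidden signal.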

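First, the network's weights are initialized with small random values. For each
learning pattern an output is produced during training. It is rated by the error
function

$$ E = \frac{1}{2} \delta^2 , $$

where

$$ \delta = t - y^{\mathrm{out}} $$

with $t$ as the desired output. Successively for each pattern, the weights are
updated according to a gradient descent in the weight space with respect to the error
function,

$$ \Delta w_j^{\mathrm{out}} = -\epsilon \frac{\partial E}{\partial w_j^{\mathrm{out}}} , \qquad \Delta w_{jk}^{\mathrm{hid}} = -\epsilon \frac{\partial E}{\partial w_{jk}^{\mathrm{hid}}} , $$

with $\epsilon > 0$ as learning rate. This leads to the learning rules

$$ \Delta w_j^{\mathrm{out}} = \epsilon \, \delta \, y_j^{\mathrm{hid}} $$

and

$$ \Delta w_{jk}^{\mathrm{hid}} = \epsilon \, \delta \, w_j^{\mathrm{out}} \, \sigma'\Big( \sum_l w_{jl}^{\mathrm{hid}} x_l \Big) \, x_k . $$

The weight update values $\Delta w$ are added to the corresponding weights directly after
each presentation of a learning sample (incremental learning).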
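
A single incremental update for the squared error E = (t - y_out)^2 / 2 can be sketched as follows. The function name `train_step`, the toy target function, the learning rate and the network size are assumptions chosen only to make the example run; they are not taken from the study.

```python
import math
import random


def train_step(x, target, w_hid, w_out, eps):
    """One incremental backpropagation update for E = 0.5 * (target - y_out)**2.

    Weight lists are modified in place; index 0 holds the threshold weight.
    Returns the error E computed before the update.
    """
    x_ext = [1.0] + list(x)
    y_hid = [math.tanh(sum(w * s for w, s in zip(w_j, x_ext))) for w_j in w_hid]
    y_ext = [1.0] + y_hid
    y_out = sum(w * s for w, s in zip(w_out, y_ext))
    delta = target - y_out
    # Hidden-layer updates use the output weights from before this step.
    for j, w_j in enumerate(w_hid):
        back = delta * w_out[j + 1] * (1.0 - y_hid[j] ** 2)  # tanh' = 1 - tanh^2
        for k, s in enumerate(x_ext):
            w_j[k] += eps * back * s
    for j, s in enumerate(y_ext):
        w_out[j] += eps * delta * s
    return 0.5 * delta ** 2


# Toy usage: learn t = 2*x1 - x2 from random samples (purely illustrative).
rng = random.Random(0)
n_in, n_hid = 2, 3
w_hid = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hid + 1)]
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
errors = []  # mean error per cycle through the training data
for cycle in range(50):
    total = 0.0
    for x in samples:
        total += train_step(x, 2 * x[0] - x[1], w_hid, w_out, eps=0.05)
    errors.append(total / len(samples))
```

The weights change after every sample rather than after a full pass over the data, which is the incremental mode described in the text.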

The event data generated by the theoretical model were divided into three sets of 
equal size. The first set was used as training data, the second provided a criterion for 
stopping the learning process (in order to avoid overfitting), whereas the third one was 
used to determine the generalization ability of the system. Since there were virtually 
no limitations to the number of events available, large sets (1000 events each) could be 
used to reduce fluctuation of performance. For a complete learning session, typically 
several hundred cycles through the entire training data were necessary. Training was 
stopped when the performance on the criterion data set was best. 
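
The three-way split and the stopping criterion can be illustrated with a deliberately small stand-in model. In this sketch a linear fit in one made-up observable replaces the perceptron; the names `split_three`, `mean_abs_error` and `train_early_stopping`, the toy data and all numerical settings are assumptions of the example.

```python
import random


def split_three(events, rng):
    """Shuffle events and split them into three sets of equal size."""
    events = list(events)
    rng.shuffle(events)
    n = len(events) // 3
    return events[:n], events[n:2 * n], events[2 * n:3 * n]


def mean_abs_error(weights, data):
    """Mean |b_true - b_reco| for a linear model b_reco = w0 + w1 * x."""
    return sum(abs(b - (weights[0] + weights[1] * x)) for x, b in data) / len(data)


def train_early_stopping(train, criterion, eps, max_cycles):
    """Incremental gradient descent; keep the weights that are best on `criterion`."""
    weights = [0.0, 0.0]
    best_err, best_weights, best_cycle = float("inf"), list(weights), -1
    for cycle in range(max_cycles):
        for x, b in train:
            delta = b - (weights[0] + weights[1] * x)
            weights[0] += eps * delta
            weights[1] += eps * delta * x
        err = mean_abs_error(weights, criterion)
        if err < best_err:  # remember the best state seen so far
            best_err, best_weights, best_cycle = err, list(weights), cycle
    return best_weights, best_err, best_cycle


# Toy data: one made-up observable x per event, b = 10 - 8 x plus noise.
rng = random.Random(1)
events = []
for _ in range(300):
    x = rng.random()
    events.append((x, 10.0 - 8.0 * x + rng.gauss(0.0, 0.3)))
train, criterion, test = split_three(events, rng)
weights, crit_err, cycle = train_early_stopping(train, criterion, eps=0.05, max_cycles=100)
test_err = mean_abs_error(weights, test)  # generalization estimate
```

Only the third set enters `test_err`, so the quoted generalization error is not biased by the choice of the stopping point.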

In contrast to the extensive training phase, very little calculation time is needed 
for the application of a trained network, so that it could be integrated into the data 
analysis process itself. 

Figure 2 shows the effects of the chosen resolution (the number of input components)
on the selected input for three impact parameter bins (central, medium and
peripheral impact parameters). When increasing the resolution in order to improve
the information content of the input, the following points have to be taken into
account:

• The computational time rises quadratically with the input grid dimension, and
therefore the processing time poses a constraint on the resolution.

• The increase in information through an increased resolution does not necessarily
translate into an improved network performance (see figure 4).
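
The quadratic growth can be made explicit by counting weights. The following is a simple bookkeeping sketch that assumes one threshold weight per unit, with $n$ the grid dimension and $H$ the number of hidden units (symbols introduced here for illustration).

```latex
N_{\mathrm{in}} = n^2 , \qquad
N_w = H \,(n^2 + 1) + (H + 1)

% Examples with H = 10 hidden units:
%   n = 10:  N_w = 10 \cdot 101 + 11 = 1021
%   n = 20:  N_w = 10 \cdot 401 + 11 = 4021
% Without hidden units (inputs connected directly to the output):
%   N_w = n^2 + 1, i.e. 101 for n = 10 and 401 for n = 20
```

Doubling the grid dimension thus roughly quadruples the number of weights that have to be evaluated and updated for every event.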

Sample results can be seen in figure 3. The true impact parameter is plotted
versus the reconstructed impact parameter for full events (upper frame) and events
processed with the filter of the FOPI experiment (lower frame). Each dot
represents one event. Both calculations have been performed with a 10 x 10 input grid,
10 hidden units, and one output unit. In the lower frame the boundaries in $p_t$ and $p_z$
have been adapted to the acceptance range of the detector filter. In both cases the
quality of reconstruction remains constant over the whole impact parameter range.

The effect of the detector filter can be seen in the slight broadening of the
distribution. Although the symmetry about the $p_z = 0$ axis has been broken and phase space
severely reduced, the network at first glance seems to suffer only minor performance
losses. A detailed quantitative analysis (see below) shows, however, that the network
performance is diminished by 30 % due to the detector filter. In view of the severe
phase space limitations caused by the detector cut, the network performance on the
filtered sample is nevertheless remarkable.

Test calculations using three-dimensional $(p_x, p_y, p_z)$ information as input rather
than the reduced $(p_t, p_z)$ set did not yield a significant performance improvement.
Moreover, the use of Cartesian momenta is not feasible in practice: for the proper
definition of the x and y directions the reaction plane has to be determined.
Experimentally this is a complicated task which can only be achieved within an
uncertainty of approximately 30 degrees.

The performance of the network is quantified by the average difference $\Delta b$
between the reconstructed and the true impact parameter.

In the first test case (10 x 10 input grid, 10 hidden units) the network achieved
$\Delta b = 0.25$ fm for the unfiltered and $\Delta b = 0.35$ fm for the filtered input.
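
A performance measure of this kind could be computed as below. The sketch assumes that the "average difference" is the mean absolute deviation; the function name `mean_abs_difference` and the three toy events are illustrative. The last lines compare the two quoted values, since the text does not state which of them the 30 % loss is measured against.

```python
def mean_abs_difference(b_true, b_reco):
    """Average of |b_true - b_reco| over all events (assumed definition)."""
    if len(b_true) != len(b_reco) or not b_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - r) for t, r in zip(b_true, b_reco)) / len(b_true)


# Three made-up events, impact parameters in fm.
delta_b = mean_abs_difference([1.0, 5.0, 9.0], [1.5, 4.5, 9.0])

# Values quoted in the text for the 10 x 10 grid with 10 hidden units.
unfiltered, filtered = 0.25, 0.35
loss_vs_filtered = (filtered - unfiltered) / filtered      # about 0.29
loss_vs_unfiltered = (filtered - unfiltered) / unfiltered  # about 0.40
```

Relative to the filtered value the difference is close to the quoted 30 %, whereas relative to the unfiltered value it is about 40 %.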

How does the performance vary with the dimension of the input grid and the
number of hidden units? In figure 4 (upper frame) the performance $\Delta b$ is plotted
versus the input grid dimension (the square root of the number of input components)
for the unfiltered and filtered data, with and without adaptation of the input
boundaries to the available phase space. In the region of optimum network performance
this adaptation has little effect; only for very small input grid dimensions is a
large effect (a performance gain of 40 %) obtained.

Far more important is the input grid dimension itself. In the limit of a 2 x 2
input grid one basically reduces the input to the ratio of transverse to longitudinal
degrees of freedom. This corresponds to the well-known directivity criterion for
impact parameter selection. Increasing the input grid dimension from 2 to 5 improves
the performance by a factor of up to two. This indicates that there are correlations
present in the $(p_z, p_t)$ distribution of the event which are sensitive to the impact
parameter and are not covered by the directivity observable or any other related ratio
of transverse to longitudinal degrees of freedom. However, no significant performance
gain is obtained when increasing the input grid dimension beyond 10. This finding
is important, since it shows that a further increase in the information presented to the
network (via a larger input grid) does not lead to a better reconstructive power for the
impact parameter. Therefore one can conclude that all $(p_z, p_t)$ correlations useful for
the impact parameter reconstruction can be contained in a 5 x 5 grid of equidistant
bins covering the relevant (detectable) momentum space area. The performance loss
due to the detector filter remains constant for dimensions larger than 4 and increases
for those smaller than 4.

In figure 4 (lower frame) the performance $\Delta b$ is again plotted versus the input
grid dimension. Filtered input with asymmetric binning was used. Two calculations are
shown, one with 10 hidden units and one without any hidden units, where the input
components were connected directly to the output unit (also called a simple
perceptron). In the latter case the learning rule becomes particularly simple, as the
input components replace the hidden units.

For input grid dimensions larger than 5, the number of hidden units seems to 
be totally irrelevant, so that the simple perceptron suffices. In essence the task 
for the network appears to be a linear one. This simple network configuration has 
the great advantage that one can study its working mechanisms in great detail: 
The reconstructed impact parameter is obtained essentially by computing the scalar 
product between the input vector (a one-dimensional representation of the 10 x 10 
input grid) and a weight vector. An "optimal" set of weights minimizing the average 
error for all events of the training set could in fact be determined via calculation 
and inversion of a Hessian matrix. A numerical test confirmed this. For nonlinear 
systems with hidden units, however, grossly deteriorated generalization capabilities 
would be expected. For better comparability, the learning process was therefore used 
in all cases. 
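
For the network without hidden units, the direct determination of the weights amounts to a linear least-squares problem. The sketch below solves H w = g with H the sum of x x^T over all events and g the sum of t x, using plain Gaussian elimination. The function names and the toy data are assumptions; with real hit-count grids H can be singular (for example when a bin is empty in every event), which this sketch does not handle.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def optimal_linear_weights(inputs, targets):
    """Minimise sum (t - w.x)^2 by solving H w = g.

    A constant 1 is prepended to every input vector (threshold weight).
    """
    rows = [[1.0] + list(x) for x in inputs]
    d = len(rows[0])
    h = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    g = [sum(t * r[i] for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear(h, g)


# Toy check: targets generated exactly from known weights are recovered.
inputs = [[3, 0], [1, 2], [0, 4], [2, 2], [4, 1]]
targets = [11.0 - 1.0 * x0 + 0.5 * x1 for x0, x1 in inputs]
weights = optimal_linear_weights(inputs, targets)
```

In this toy case `weights` reproduces the generating values (threshold 11, then -1 and 0.5) up to rounding.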

In figure 5 the weight vector of a 10 x 10 network without hidden units is
mapped onto the representation of the input grid as displayed in figure 2, third row.
Frame a) shows the distribution for unfiltered input, whereas frame b) shows the
distribution for the filtered input. The distribution of weights correlates directly
with the presented physical input. Areas in the $(p_z, p_t)$ plane associated with small
impact parameters are assigned negative weights (contour lines A-G), whereas those
associated with large impact parameters carry positive weights (contour lines H-L).
This specific distribution becomes intuitively clear if one remembers that the scalar
product of input vector and weight vector is simply the sum of the number of hits
in each momentum bin multiplied by their corresponding weights. Since the result
of this sum is the reconstructed impact parameter, hits in peripheral areas must
contribute large positive values whereas hits in central areas must compensate with
negative values.
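
The sign pattern can be illustrated with a tiny made-up example. All numbers below (a 2 x 2 grid, the weights, the threshold of 6 fm and the two events) are invented for illustration and do not come from the trained network; `reconstruct_b` and `as_grid` are hypothetical helper names.

```python
def reconstruct_b(counts, weights, threshold=0.0):
    """Scalar product of hit counts and weights plus the threshold term."""
    return threshold + sum(c * w for c, w in zip(counts, weights))


def as_grid(vector, n):
    """Reshape a flat list of n*n values into n rows of n values."""
    return [list(vector[i * n:(i + 1) * n]) for i in range(n)]


# Negative weights where central events put their hits, positive weights
# where peripheral events put theirs (made-up values).
weights = [-0.5, 1.0, 1.0, -0.25]
central_event = [6, 1, 0, 4]
peripheral_event = [1, 3, 3, 0]
b_central = reconstruct_b(central_event, weights, threshold=6.0)        # 3.0
b_peripheral = reconstruct_b(peripheral_event, weights, threshold=6.0)  # 11.5
weight_map = as_grid(weights, 2)  # same layout as the input grid
```

Reshaping the weight vector with `as_grid` gives the kind of map that can be drawn over the input grid, one weight per momentum bin.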

For the application of the proposed technique to experimental data the neural
network needs to be trained on a data set with known impact parameters, i.e. a
data set generated by a transport model. This model must be able to accurately
reproduce the final state of the heavy ion collision, most importantly the particle
and fragment multiplicities and the respective double differential cross sections in
$(p_t, p_z)$ space. This procedure, however, creates a model bias. Its magnitude can
be estimated by generating two sets of weights from two different transport models.
The network can then be applied with both sets to the same experimental data, and
the difference in the reconstructed impact parameters yields an estimate of the model
bias involved.
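
Such a comparison could look as follows for a network without hidden units. The sketch assumes that the bias is summarized as the mean absolute difference between the two reconstructions; the function names, the two weight sets and the two toy events are hypothetical.

```python
def reconstruct(counts, weights):
    """Linear reconstruction; weights[0] is the threshold weight."""
    return weights[0] + sum(c * w for c, w in zip(counts, weights[1:]))


def model_bias(events, weights_a, weights_b):
    """Mean absolute difference of the two reconstructions over all events."""
    diffs = [abs(reconstruct(e, weights_a) - reconstruct(e, weights_b)) for e in events]
    return sum(diffs) / len(diffs)


# Two made-up events (hit counts in two bins) and two made-up weight sets,
# standing in for networks trained on two different transport models.
events = [[2, 1], [0, 3]]
weights_model_a = [5.0, -1.0, 0.5]
weights_model_b = [5.5, -1.0, 0.25]
bias = model_bias(events, weights_model_a, weights_model_b)  # 0.25 (fm)
```

Other summaries, such as the difference as a function of the reconstructed impact parameter, would serve equally well; the mean is used here only to keep the example short.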

In summary, we have applied neural network techniques to the problem of impact
parameter determination in relativistic heavy ion collisions. As input, the transverse
and longitudinal momentum distributions of all outgoing (or detectable) particles
were discretized in two-dimensional grids of varying size. The neural network
approach yields an improvement of a factor of two as compared with classical techniques.
All information necessary to achieve this improvement is present in a 5 x 5 grid in
$(p_t, p_z)$ space. Simple network algorithms suffice; hidden units are not necessary.

FIG. 1. Momentum distributions in $p_t$ and $p_z$ (momenta per nucleon) for nucleons
and fragments in Au+Au collisions at 1 GeV/nucleon in the final state, i.e. after all
collisions have ceased. The left column shows the full 4π acceptance whereas the right
column shows the events filtered with the acceptance of the FOPI detector (phase I).
The different rows depict different impact parameter bins.

FIG. 2. Discretization of the momentum distribution for input into the neural network.
The columns refer to central, medium and peripheral collisions (left to right). The
rows show the input for different grid dimensions (top to bottom: 3, 5, 10 and 20).
The area of the squares is proportional to the number of particles in the bins.

FIG. 3. True impact parameter versus the reconstructed impact parameter for full
events (upper frame) and events processed with the filter of the FOPI experiment (lower
frame). Each dot represents one event.

FIG. 4. Network performance $\Delta b$ versus input grid dimension n. The upper frame
compares calculations with and without detector filter. The original boundaries, which
cover the available phase space information in the unfiltered case, are marked as symmetric
binning, whereas the boundaries adapted to the detector acceptance of the filtered data set
are marked as asymmetric binning. In the lower frame calculations with and without hidden
units are displayed.

FIG. 5. Weight vector mapped onto the representation of the input grid for unfiltered
(top) and filtered (bottom) input. The weight distribution is directly correlated to the
physical content of the input.